Daniel Barth, Nicholas W. Papageorge, Kevin Thom
Journal of Political Economy, 2020, vol. 128, No.4
Barth, Papageorge, and Thom (2020) takeaways
.
└── Chromosome
└── DNA
└── Gene
└── Genome
Twin studies
Genome-wide association studies (GWASs)
Regress outcome \(Y_{i}\) on each SNP using \(J\) estimating equations:
\[ Y_{i} = \beta_{j}SNP_{ij}+\bfmu'_{j}\bfx_{i}+\epsilon_{ij}, \quad j=1,\dots,J. \]
\(\cov[SNP_{ij}, \epsilon_{ij}]> 0\): Parents see traits which may be correlated with \(SNP_{ij}\) and decide on investments. Environment is endogenous to the trait in SNP regressions.
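The \(J\) estimating equations above can be sketched as a loop of simple regressions. This is a minimal illustration on simulated data, not the procedure of any actual GWAS: the sample size, the true effects, the independent allele draws, and the helper names `ols_slope` and `simulate_gwas` are all assumptions, the controls \(\bfx_{i}\) are reduced to an intercept, and the error is drawn independently of the SNPs, so the endogeneity discussed above is deliberately absent.

```python
import random


def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_gwas(n=2000, J=5, seed=0):
    """Simulate n individuals with J SNPs and run one regression per SNP."""
    rng = random.Random(seed)
    # Hypothetical true effects: only the first SNP affects the outcome.
    true_beta = [0.5] + [0.0] * (J - 1)
    # Allele counts in {0, 1, 2}, drawn independently across SNPs.
    snp = [[rng.randint(0, 1) + rng.randint(0, 1) for _ in range(J)]
           for _ in range(n)]
    # Outcome with an error term that is independent of every SNP.
    y = [sum(b * s for b, s in zip(true_beta, row)) + rng.gauss(0, 1)
         for row in snp]
    # One estimating equation per SNP, j = 1, ..., J.
    return [ols_slope([row[j] for row in snp], y) for j in range(J)]


beta_hat = simulate_gwas()
print([round(b, 3) for b in beta_hat])
```

With exogenous errors the first coefficient should land near the assumed 0.5 and the others near zero; making the error depend on the SNPs would push the estimates away from the true effects.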
Polygenic score of \(i\) for the outcome \(Y\) = “Educational attainment (EA) score”
\[ PGS_{i}=\sum_{j=1}^{J}\tilde{\beta}_{j} SNP_{ij} \]
Use Bayesian LDpred procedure to correct for correlations in \(\tilde{\beta}_{j}\)
Use all SNPs: better out-of-sample results than using only SNPs with genome-wide significance (\(p\) value \(< 5\times 10^{-8}\), i.e. 0.000005%)
PGS is considered to be a predictor of individual fixed effects
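The score is a weighted sum of allele counts, which a few lines make concrete. In this sketch the weights stand in for the adjusted coefficients \(\tilde{\beta}_{j}\); the numbers, the two example individuals, and the function name `polygenic_score` are hypothetical, and the LDpred adjustment itself is not implemented.

```python
def polygenic_score(snp_row, weights):
    """Weighted sum of allele counts: sum_j weights[j] * snp_row[j]."""
    if len(snp_row) != len(weights):
        raise ValueError("snp_row and weights must have the same length")
    return sum(w * s for w, s in zip(weights, snp_row))


# Hypothetical adjusted GWAS coefficients for J = 3 SNPs.
weights = [0.02, -0.01, 0.03]

# Hypothetical allele counts (0, 1, or 2 copies) for two individuals.
person_a = [2, 0, 1]
person_b = [0, 2, 0]

pgs_a = polygenic_score(person_a, weights)
pgs_b = polygenic_score(person_b, weights)
print(pgs_a, pgs_b)
```

In applied work the resulting scores are usually standardized within the sample before entering a regression; that step is omitted here.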
Gene-wealth gradient \(\gamma\) with education control
\[ \begin{aligned} W_{i} &= \bftheta'\bfx_{i}+\gamma PGS_{i}+\eta Y_{i}+e_{i},\\ &= \bftheta'\bfx_{i}+\gamma PGS_{i}+\frac{\eta}{J}\left\{PGS_{i}+\sum_{j=1}^{J}\left(\bfmu'_{j}\bfx_{i}+\tilde{\epsilon}_{ij}\right)\right\}+e_{i},\\ &= \bftheta'\bfx_{i}+\left(\gamma+\frac{\eta}{J}\right)PGS_{i}+\frac{\eta}{J}\sum_{j=1}^{J}\left(\bfmu'_{j}\bfx_{i}+\tilde{\epsilon}_{ij}\right)+e_{i},\\ &\simeq \left(\bftheta'+\eta\bar{\bfmu}'\right)\bfx_{i}+\gamma PGS_{i}+\eta\bar{\tilde{\epsilon}}_{i}+e_{i}. \end{aligned} \] where \[ JY_{i} = PGS_{i}+\sum_{j=1}^{J}\left(\bfmu'_{j}\bfx_{i}+\tilde{\epsilon}_{ij}\right). \]
Because \(\cov[SNP_{ij}, \epsilon_{ij}]> 0\), in general, we have \(\cov[PGS_{i}, \bar{\tilde{\epsilon}}_{i}]> 0\).
So, in general, \(\plim \hat{\gamma}-\gamma>0\).
This is problematic: the paper tries to show persistence of positive \(\hat{\gamma}\)
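A small simulation shows the direction of the bias claimed above. It is a sketch under strong assumptions: the score and the averaged residual \(\bar{\tilde{\epsilon}}_{i}\) are drawn as normal variables with a positive dependence `rho`, the controls \(\bfx_{i}\) are dropped, and the values of `gamma`, `eta`, and `rho` are made up for illustration. Under this setup the probability limit of the OLS slope is \(\gamma+\eta\rho\).

```python
import random


def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_gradient(gamma=0.05, eta=0.5, rho=0.3, n=20000, seed=1):
    """Regress wealth on PGS when eta * eps_bar is left in the error term."""
    rng = random.Random(seed)
    pgs = [rng.gauss(0, 1) for _ in range(n)]
    # Averaged GWAS residual, positively correlated with the score (rho > 0).
    eps_bar = [rho * p + rng.gauss(0, 1) for p in pgs]
    # Wealth: W = gamma * PGS + eta * eps_bar + e, with e independent noise.
    w = [gamma * p + eta * eb + rng.gauss(0, 1)
         for p, eb in zip(pgs, eps_bar)]
    # eps_bar is unobserved, so the regression uses PGS only.
    return ols_slope(pgs, w)


gamma_true = 0.05
gamma_hat = simulate_gradient(gamma=gamma_true)
bias = gamma_hat - gamma_true
print(round(gamma_hat, 3), round(bias, 3))
```

Setting `rho=0` in the same function removes the dependence, and the estimate then centers on the assumed `gamma`.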
Authors may have acknowledged this: they kept cautioning that
A partial remedy: Use \(\tilde{Y}_{i}=\tilde{\beta}_{j}SNP_{ij}+\bfmu'_{j}\bfx_{i}\) in place of \(Y_{i}\) on RHS.
This partly takes away the correlation between \(PGS_{i}\) and the error term
Because \(\cov[SNP_{ij}, \epsilon_{ij}]> 0\), \(\tilde{\beta}_{j}\) is overestimated, so the deviation of \(\hat{\tilde{Y}}_{i}\) from the real \(\tilde{Y}_{i}\) is larger for individuals with large \(SNP_{ij}\). Although partly removed, a positive correlation between PGS and the error term still remains.
2590 HHs, 5701 HH-year observations (Table 1)
our sample comprises households for whom wealth data are most likely to be both accurate and comprehensive.
However, the magnitudes of these differences are similar and relatively modest across alternate samples. Restricting our sample to retired households balances concerns about sample selection and measurement error. (p.19)
Meaning:
Fig 2A
EA score \(\SIM\) wealth
Table 2 also shows EA score \(\SIM\) wealth ($475K, q4-q1), lifetime labor income ($380K, q4-q1)
Fig 2B
EA score \(\SIM\) wealth similar between high schoolers vs college grads, up to q3
Tab 4
EA score \(\SIM\) wealth gradient: .246 (raw)→.070 (+edu)→.047 (+labor income)
Gradient: “savvy” + school quality ← income measurement errors
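The attenuation in the Table 4 gradients quoted above can be restated as shares of the raw gradient. The sketch below only does arithmetic on the three coefficients listed in these notes; the dictionary keys and variable names are labels chosen here, not the paper's.

```python
# Gradient estimates as quoted in these notes (Table 4).
gradients = {"raw": 0.246, "+edu": 0.070, "+labor income": 0.047}

raw = gradients["raw"]

# Share of the raw gradient that survives each set of controls.
share_remaining = {name: value / raw for name, value in gradients.items()}

# Share of the raw gradient absorbed by the added controls.
share_absorbed = {name: 1 - share for name, share in share_remaining.items()}

for name in gradients:
    print(name, round(share_remaining[name], 3), round(share_absorbed[name], 3))
```

Read this way, roughly 28% of the raw gradient remains after the education controls and roughly 19% after labor income is also added.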
Robust to:
EA score↑
EA score↑
Gene-wealth gradient ⇐ a household’s
For example, it may be that children with lower polygenic scores begin to face challenges at particular ages or struggle to meet specific educational milestones. In that case, we could better target educational policies to help alleviate these roadblocks.
We can now recognize that such learning delays may be a source of inequality.